Custom Crawl Services

Internet Archive

Large-scale web harvests and national domain crawls performed for National Libraries, National Archives, preservation partners, research initiatives, and as part of special projects and custom crawling and research services.

91,577
RESULTS
National Library of Australia Crawls
collection
28,068
ITEMS
204.9M
VIEWS
Crawls performed by Internet Archive on behalf of the National Library of Australia. This data is currently not publicly accessible.
National Library of Spain Crawls
collection
6,741
ITEMS
141.5M
VIEWS
Data collected by Internet Archive on behalf of the National Library of Spain. This data is currently not publicly accessible.
Bibliotheque Nationale de France Domain Crawls
collection
1,652
ITEMS
100.3M
VIEWS
Crawls of the French domain space performed by Internet Archive on behalf of Bibliotheque Nationale de France. This data is currently not publicly accessible.
Elections Web
collection
1,609
ITEMS
80.9M
VIEWS
This collection contains collaborative Election crawls performed by IA.
Topics: elections, web
Election Crawl 2012
collection
1,608
ITEMS
80.9M
VIEWS
This crawl was performed in Summer & Fall of 2012 to archive the US Federal Elections.
Topics: US, federal, elections, web, 2012
National Library of Australia Crawl
collection
4,650
ITEMS
58.4M
VIEWS
National Library of Australia crawl. This data is currently not publicly accessible.
National Archives and Records Administration
collection
9,160
ITEMS
54.3M
VIEWS
National Archives and Records Administration crawl performed by Internet Archive. This data is currently not publicly accessible.
bnf_2008
collection
715
ITEMS
49.2M
VIEWS
This data is currently not publicly accessible.
Olympics Web
collection
2,064
ITEMS
38.5M
VIEWS
This collection includes all collaborative Olympic crawls performed by IA for the IIPC.
Topics: olympics, IIPC, web
National Library of New Zealand Crawls
collection
10,920
ITEMS
37.9M
VIEWS
Crawls performed by Internet Archive on behalf of the National Library of New Zealand. This data is currently not publicly accessible.
nls_2009
collection
874
ITEMS
32.1M
VIEWS
This data is currently not publicly accessible.
nls_2010
collection
971
ITEMS
29.3M
VIEWS
This data is currently not publicly accessible.
Olympics Crawl 2012
collection
702
ITEMS
28.5M
VIEWS
These crawls were performed by IA on behalf of the IIPC in Summer 2012 during and prior to the 2012 Summer Olympics held in London, UK.
Topics: London, olympics, web, 2012, IIPC
National Library of Israel
collection
3,194
ITEMS
27.9M
VIEWS
by dominic@archive.org
Data collected by Internet Archive on behalf of the National Library of Israel. This data is currently not publicly accessible.
Topic: nlil
NLS_2011
collection
1,518
ITEMS
27.3M
VIEWS
These crawls of the .es domain were performed in 2011 on behalf of the National Library of Spain (BNE).
Topics: bne, spain, web, 2011
NLA 2013 Domain crawl
collection
2,824
ITEMS
23.7M
VIEWS
This crawl of the .au domain was performed on behalf of the National Library of Australia in Spring of 2013.
Topics: nla, web, 2013
NLA_2015
collection
3,087
ITEMS
22.8M
VIEWS
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2015.
Topics: nla, web, 2015
NLA_2014
collection
2,188
ITEMS
22.4M
VIEWS
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2014.
Topics: nla, web, 2014
bnf_2007
collection
321
ITEMS
21M
VIEWS
This data is currently not publicly accessible.
collection
eye 20.7M
Topics: bne, spain, web, 2013
nlaweb2016
collection
3,590
ITEMS
20.4M
VIEWS
This crawl of the .au domain was performed on behalf of the National Library of Australia in 2016.
Topics: nla, australia, web
NARA 112th Congressional Crawl
collection
705
ITEMS
19.8M
VIEWS
This crawl of online resources of the 112th US Congress was performed in Fall of 2012 and early winter of 2013 on behalf of NARA.
Topics: nara, 112th, web
nla_2008
collection
631
ITEMS
19.8M
VIEWS
This data is currently not publicly accessible.
NLS_2012
collection
776
ITEMS
18.9M
VIEWS
This crawl of the .es domain was performed in 2012 on behalf of the National Library of Spain (BNE).
Topics: bne, spain, web, 2012
nla_2009
collection
567
ITEMS
16.8M
VIEWS
This data is currently not publicly accessible.
bnf_2005
collection
265
ITEMS
15.4M
VIEWS
This data is currently not publicly accessible.
IMLS Museum Universe Data File Crawl
collection
2,876
ITEMS
15.3M
VIEWS
2015 crawl of museum websites listed in the IMLS Museum Universe Data File. More about the IMLS MUDF can be found at https://www.imls.gov/research-evaluation/data-collection/museum-universe-data-file
Topic: AIT
nla_2007
collection
371
ITEMS
15.3M
VIEWS
This data is currently not publicly accessible.
National Library of Sweden
collection
309
ITEMS
15.2M
VIEWS
Data collected by Internet Archive on behalf of the National Library of Sweden. This data is currently not publicly accessible.
nl_sweden_2010
collection
308
ITEMS
15.2M
VIEWS
This data is currently not publicly accessible.
nla_2006
collection
384
ITEMS
14.9M
VIEWS
This data is currently not publicly accessible.
NLA 2017 Domain Crawl
collection
4,875
ITEMS
14.7M
VIEWS
Crawls performed by the Internet Archive in 2017 on behalf of the National Library of Australia.
Topic: nla web 2017
NARA 114th Congressional Crawl
collection
3,615
ITEMS
14.4M
VIEWS
This crawl of online resources of the 114th US Congress was performed on behalf of The United States National Archives & Records Administration (NARA).
NLIL_2013
collection
1,187
ITEMS
14.2M
VIEWS
by dominic@archive.org
This crawl of the .il domain was performed in 2013 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2013
bnf_2006
collection
323
ITEMS
13.8M
VIEWS
This data is currently not publicly accessible.
nla_2005
collection
175
ITEMS
12.8M
VIEWS
This data is currently not publicly accessible.
National Library of Luxembourg
collection
5,296
ITEMS
12.3M
VIEWS
National Library of Luxembourg
Topic: Luxembourg
National Library of Ireland Crawls
collection
2,620
ITEMS
12M
VIEWS
Crawls performed by Internet Archive on behalf of the National Library of Ireland. This data is currently not publicly accessible.
IMLS Museum Universe 00001
collection
2,273
ITEMS
11.6M
VIEWS
Crawl 00001 of the IMLS Museum Universe Data File.
nlnzweb2013
collection
921
ITEMS
10.7M
VIEWS
This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in February 2013.
Topics: web, domain
nlnzweb2016
collection
1,513
ITEMS
10.4M
VIEWS
This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in January 2016.
Topics: new zealand, web, domain
NARA 111th Congressional Crawl
collection
214
ITEMS
8.5M
VIEWS
This crawl of online resources of the 111th Congress of the United States was performed in Fall of 2010 and Winter of 2011 on behalf of NARA.
Topics: nara, 111th, congress, web
Biblioteca Nazionale Centrale di Firenze
collection
223
ITEMS
8.5M
VIEWS
Data collected by Internet Archive on behalf of Biblioteca Nazionale Centrale di Firenze. This data is currently not publicly accessible.
Fed Site Closure Crawls
collection
1,856
ITEMS
7.9M
VIEWS
These are crawls performed on US Federal Government web sites prior to their removal or merger with other resources.
Topics: federal, web, closures
Fed Site Closures 2011
collection
1,855
ITEMS
7.9M
VIEWS
This crawl was performed in Fall of 2011 to archive Federal government web sites that were either slated for removal or for merger with other online resources.
Topics: federal, web, 2011
NLS_elec2011
collection
280
ITEMS
7.6M
VIEWS
This crawl was performed on behalf of the National Library of Spain (BNE) in Fall of 2011 to archive the National elections in Spain.
Topics: elections, web, 2011, spain, bne
Olympics Crawl 2014
collection
1,338
ITEMS
7.6M
VIEWS
These crawls were performed by IA on behalf of the IIPC in Winter 2014 during and prior to the 2014 Winter Olympics and Paralympic Games held in Sochi, Russia.
Topics: olympics 2014, web, sport, olympic games
NLA 2018 Domain Crawl
collection
5,641
ITEMS
7.1M
VIEWS
Crawls performed by the Internet Archive in 2018 on behalf of the National Library of Australia.
Topics: nla, web, 2018
NLA_2010
collection
179
ITEMS
7.1M
VIEWS
This crawl was a domain scale harvest of .au performed for the National Library of Australia in 2010.
Topics: nla, web, 2010
NLIL_2014
collection
971
ITEMS
7M
VIEWS
by dominic@archive.org
This crawl of the .il domain was performed in 2014 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2014
NLIL_2015
collection
1,033
ITEMS
6.8M
VIEWS
This crawl of the .il domain was performed in 2015 on behalf of the National Library of Israel (NLIL).
Topics: nlil, israel, web, 2015
nlnzweb2015
collection
1,071
ITEMS
6.7M
VIEWS
This collection includes content harvested from the Web on behalf of the National Library & Archives New Zealand in January 2015.
Topics: new zealand, web, domain
collection
eye 6.1M
Crawl of the Ireland web domain, .ie, performed for the National Library of Ireland in 2007. This data is currently not publicly accessible.
NLNZ Domain Crawl 2018
collection
3,539
ITEMS
6M
VIEWS
Domain crawl of the New Zealand web domain (.nz) performed by Internet Archive on behalf of the National Library of New Zealand in January-February, 2018.
Topics: web, nlnz, 2018
NLS_humanidades
collection
296
ITEMS
5.6M
VIEWS
This crawl was performed in 2011 and 2012 on behalf of the National Library of Spain (BNE) to archive digital humanities web sites and online resources in Spain.
Topics: bne, spain, web, humanities, humanidades, 2011, 2012
National Library of Ireland 2017 Web Archive
collection
2,510
ITEMS
5.2M
VIEWS
2017 domain crawl for National Library of Ireland.
Topics: ireland, web
nlnz_2010
collection
167
ITEMS
5.2M
VIEWS
This data is currently not publicly accessible.
NLNZ Spring 2017 Domain Crawl
collection
2,389
ITEMS
4.9M
VIEWS
This crawl of the .nz domain was performed on behalf of the National Library of New Zealand in Spring of 2017.
Topics: nlnz, web, 2017
Internet Archive Research Publication Crawls
collection
6,454
ITEMS
4.8M
VIEWS
by Internet Archive Web Group
A series of open web crawls targeting journal articles, technical memos, essays, datasets, and other research publications. This collection contains WARC and CDX files that end up in the Wayback Machine (https://web.archive.org). See also the bibliographic metadata corpora at https://archive.org/details/ia_biblio_metadata
National Library of Luxembourg Crawl Fall 2016
collection
817
ITEMS
4.7M
VIEWS
Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in September and October of 2016.
Topic: Luxembourg
NLAgov_2010
collection
624
ITEMS
4.1M
VIEWS
This crawl was performed on the .gov.au domain in 2010 on behalf of the National Library of Australia.
Topics: nla, gov.web, 2010
IMLS Museum Universe 00000
collection
601
ITEMS
3.7M
VIEWS
Crawl 00000 of the IMLS Museum Universe Data File.
Topic: museum imls universe
UNT Web
collection
35
ITEMS
3.3M
VIEWS
This collection contains all collaborative crawl data contributed by University of North Texas (UNT).
Topics: UNT, web, texas, eot
by Internet Archive
collection
eye 2.8M
This collection includes all resources harvested from the online presence of the Legislative branch of the US Federal government as part of the NARA 112th Congressional Web Harvest Test Crawl. The crawl was performed from October 16th through November 5th 2012.
Topics: NARA, 112th, Congress
Indonesia 2017 Domain Crawl
collection
667
ITEMS
2.6M
VIEWS
Crawls performed by the Internet Archive of the .id (Indonesia) web domain. This data is not currently publicly accessible.
Topics: web, 2017
Olympics Crawl 2010
collection
21
ITEMS
2.5M
VIEWS
These crawls were performed by IA on behalf of the IIPC in Winter 2010 during and prior to the 2010 Winter Olympics held in Vancouver, BC, Canada.
Topics: winter, olympics, 2010, IIPC, web
collection
eye 2.4M
Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg from December 2016 to January 2017.
Topic: Luxembourg
BNL 2017 Summer Domain Crawl
collection
944
ITEMS
2.4M
VIEWS
Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in June and July of 2017.
Topics: BNL, web, 2017
BNL 2017 Winter Domain Crawl
collection
1,181
ITEMS
2.4M
VIEWS
Domain crawl of the Luxembourg web domain (.lu) performed by Internet Archive on behalf of the National Library of Luxembourg / Bibliothèque nationale de Luxembourg in December 2017 and January 2018.
Topics: web, 2017, 2018, luxembourg, BNL
MSAG-PDF-CRAWL-2017
collection
1,855
ITEMS
2.3M
VIEWS
by Internet Archive Web Group
Microsoft Academic Graph public corpus (Feb 2016) PDF URLs, filtered to remove large sites (pubmed, citeseerx, arxiv) and already-crawled URLs.
Topics: papers, journals
nlnz_2008
collection
97
ITEMS
2.3M
VIEWS
This data is currently not publicly accessible.
collection
eye 1.9M
Data collected by Internet Archive on behalf of the Fundacao para a Computacao Cientifica Nacional of Portugal. This data is currently not publicly accessible.
NDIIPP Youtube Crawl
collection
90
ITEMS
1.7M
VIEWS
YouTube crawl performed by Internet Archive on behalf of the National Digital Information Infrastructure and Preservation Program (NDIIPP). This data is currently not publicly accessible.
collection
eye 1.6M
Crawl of the Ireland web domain, .ie, performed for the National Library of Ireland in 2007. This data is currently not publicly accessible.
NARA 110th Congressional Crawl
collection
107
ITEMS
1.5M
VIEWS
The end of term harvest of the 110th Congress of the United States was performed on behalf of NARA in Fall of 2008 and early winter of 2009.
Topics: nara, 110th, congress, web